Python Job: Senior Site Reliability Engineer (SRE)

Company

nugget.ai

Location

Toronto - Canada

Job type

Full-Time

Python Job Details

We’re building the team and are looking for a Senior Site Reliability Engineer (SRE) in our Toronto office. As an SRE, you will work on one of our infrastructure teams to build and run the core components that power the rest of the organization. You will also partner with our other engineering teams to help make their services more performant, scalable, observable, and reliable. Every engineering team is responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to make that a reality.

What you will do:

  • Develop and promote conventions on production readiness
  • Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
  • Debug production issues across services and levels of the stack, and automate where possible
  • Help management with cost engineering and capacity planning
  • Participate in on-call rotations, along with every member of the engineering team
  • Improve common operational challenges with tooling
  • Work with engineering leads to plan for the growth of the infrastructure
  • Design, build, and maintain the core infrastructure used by all of Tim Horton’s engineering teams
  • Design and manage on-call and escalation processes

Required Skills:

  • You have 5+ years of experience as an SRE / operations engineer
  • You think about systems: edge cases, failure modes, behaviors, and specific implementations
  • Experience with AWS Lambda architecture
  • Experience with Configuration Management and Infrastructure as Code - Terraform
  • Strong programming skills in JavaScript, Shell, and Go (Python is nice to have)
  • Excellent team player with strong interpersonal and communication skills
  • Excellent go-for-it attitude. When you see something broken, you can't help but fix it.
  • An urge to deliver quickly and effectively and to iterate fast
  • High degree of curiosity and ownership
  • Experience with monitoring tools and an understanding of best practices for alerting
  • Prior experience managing on-call schedules and escalation processes

Nice to have:

  • Mobile application development (React Native)